Towards Theoretical Foundations of Clustering
Abstract
Clustering is a central unsupervised learning task with a wide variety of applications. Unlike in supervised learning, different clustering algorithms may yield dramatically different outputs for the same input sets. As such, the choice of algorithm is crucial. When selecting a clustering algorithm, users tend to focus on cost-related considerations, such as running time and software purchasing costs. Yet differences in the output of the algorithms are a more primal consideration. We propose an approach for selecting clustering algorithms based on differences in their input-output behaviour. This approach relies on identifying significant properties of clustering algorithms and classifying algorithms based on the properties that they satisfy. We begin with Kleinberg’s impossibility result, which relies on concise abstract properties that are well-suited for our approach. Kleinberg showed that three specific properties cannot be satisfied by the same algorithm. We illustrate that the impossibility result is a consequence of the formalism used, proving that these properties can be formulated without leading to inconsistency in the context of clustering quality measures or algorithms whose input requires the number of clusters. Combining Kleinberg’s properties with newly proposed ones, we provide an extensive property-based classification of common clustering paradigms. We use some of these properties to provide a novel characterization of the class of linkage-based algorithms. That is, we distil a small set of properties that uniquely identify this family of algorithms. Lastly, we investigate how the output of algorithms is affected by the addition of small, potentially adversarial, sets of points. We prove that given clusterable input, the output of k-means is robust to the addition of a small number of data points. On the other hand, clusterings produced by many well-known methods, including linkage-based techniques, can be changed radically by adding a small number of elements.
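The sensitivity of linkage-based methods mentioned above can be seen on a toy example. The following is a minimal sketch, not taken from the thesis: it assumes single linkage with a fixed number of clusters k as one representative linkage-based algorithm, and the data set and the helper names `single_linkage` and `same_cluster` are hypothetical.

```python
from itertools import combinations
from math import dist


def single_linkage(points, k):
    """Merge the two closest clusters until k clusters remain.

    Cluster distance is the minimum distance between any pair of
    members (single linkage). Returns a list of sets of point indices.
    """
    clusters = [{i} for i in range(len(points))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        # combinations yields a < b, so deleting b leaves index a valid.
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


def same_cluster(clusters, i, j):
    """True if points i and j were assigned to the same cluster."""
    return any(i in c and j in c for c in clusters)


# Hypothetical data: two tight groups on a line, far apart from each other.
left = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
right = [(10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
base = left + right

before = single_linkage(base, k=2)
# One added point, much farther away than the gap between the two groups.
after = single_linkage(base + [(100.0, 0.0)], k=2)

print(same_cluster(before, 0, 3))  # are the two groups merged without the extra point?
print(same_cluster(after, 0, 3))   # are they merged once the extra point is added?
```

In this sketch the single added point ends up as its own cluster, so the two original groups are forced into one cluster when k stays at 2. It illustrates only the linkage-based half of the claim; it says nothing about the k-means robustness result, which depends on the clusterability conditions stated in the thesis.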
Similar resources
Towards Theoretical Foundations of Clustering: Thesis Highlights
Clustering is a central unsupervised learning task with a wide variety of applications. However, in spite of its popularity, it lacks a unified theoretical foundation. Recently, there has been work aimed at developing such a theory. We discuss recent advances in clustering theory, including axiomatizing clustering and providing formal guidance for clustering algorithm selection. This paper pres...
A Theoretical Study of Clusterability and Clustering Quality
Clustering is a widely used technique, with applications ranging from data mining, bioinformatics and image analysis to marketing, psychology, and city planning. Despite the practical importance of clustering, there is very limited theoretical analysis of the topic. We make a step towards building theoretical foundations for clustering by carrying out an abstract analysis of two central concept...
Recent developments in clustering algorithms
In this paper, we give a short review of recent developments in clustering. We briefly summarize important clustering paradigms before addressing important topics including metric adaptation in clustering, dealing with non-Euclidean data or large data sets, clustering evaluation, and learning-theoretical foundations.
Effectiveness of Teaching General Courses on Theoretical Foundations of Islam on Religious Identity and Academic Resilience of Sports Science Students at Ilam University
Background and objectives: Promoting the theoretical foundations of Islam has a role in determining the individual and social behavior of students. Therefore, this study aimed to investigate the effect of teaching general courses on the theoretical foundations of Islam on the religious identity and academic resilience of Sports Science students at Ilam University. Materials and Methods: The me...
Cultural and Demographic Foundations of Social Trust in Iran
This study primarily aims to examine the cultural and demographic foundations of social trust. The research findings presented and discussed in this paper are based on a survey that includes a total sample of 5200 males and females residing in varying rural and urban areas across Iran. In order to examine social trust more appropriately, it has been classified into three main domains: trust tow...
Alwis - A Visualization Tool for Concept Based Retrieval Schemes - Theoretical Foundations and Models
Comparing and evaluating the performance of concept based retrieval schemes is notoriously difficult and of increasing practical importance. To aid the evaluation process we developed a visualization tool which puts particular emphasis on the comparison of single queries. This tool directly visualizes a generic theoretical model for concept based retrieval schemes which is presented in this pap...